{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 应用梯度累积算法"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 概述\n",
    "\n",
    "本教程介绍梯度累积的训练方式，目的是为了解决由于内存不足导致某些大型网络无法训练大`batch_size`的问题。  \n",
    "传统的训练方式是每次计算得到loss和梯度后，直接用所得梯度对参数进行更新。与传统的训练方式不同，梯度累积引入`mini_batch`的概念，首先对每个`mini_batch`的数据计算loss和梯度，但不立即更新模型参数，而是先对所得梯度进行累加，然后在指定数量（N）个`mini_batch`之后，用累积后的梯度更新网络参数。下次训练前清空过往累积梯度后重新累加，如此往复。  \n",
    "最终目的是为了达到跟直接用N个mini_batch数据训练几乎同样的效果。 \n",
    "本例将在MindSpore中应用梯度累积算法，实现对模型的训练。  \n",
    "体验过程如下：\n",
    "\n",
    "1. 数据准备。\n",
    "2. 定义深度神经网络。\n",
    "3. 训练函数并实现定义梯度累积算法。\n",
    "4. 调用自定义训练函数进行训练。\n",
    "5. 使用训练保存的模型参数进行验证。\n",
    "\n",
    "> 本文档适用于GPU环境。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 数据准备\n",
    "\n",
    "下载MNIST_Data数据集并解压到指定位置，执行如下命令："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2020-12-14 15:24:22--  https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/datasets/MNIST_Data.zip\n",
      "Resolving proxy-notebook.modelarts-dev-proxy.com (proxy-notebook.modelarts-dev-proxy.com)... 192.168.0.172\n",
      "Connecting to proxy-notebook.modelarts-dev-proxy.com (proxy-notebook.modelarts-dev-proxy.com)|192.168.0.172|:8083... connected.\n",
      "Proxy request sent, awaiting response... 200 OK\n",
      "Length: 10754903 (10M) [application/zip]\n",
      "Saving to: ‘MNIST_Data.zip’\n",
      "\n",
      "MNIST_Data.zip      100%[===================>]  10.26M  --.-KB/s    in 0.07s   \n",
      "\n",
      "2020-12-14 15:24:22 (154 MB/s) - ‘MNIST_Data.zip’ saved [10754903/10754903]\n",
      "\n",
      "Archive:  MNIST_Data.zip\n",
      "   creating: ./datasets/MNIST_Data/test/\n",
      "  inflating: ./datasets/MNIST_Data/test/t10k-images-idx3-ubyte  \n",
      "  inflating: ./datasets/MNIST_Data/test/t10k-labels-idx1-ubyte  \n",
      "   creating: ./datasets/MNIST_Data/train/\n",
      "  inflating: ./datasets/MNIST_Data/train/train-images-idx3-ubyte  \n",
      "  inflating: ./datasets/MNIST_Data/train/train-labels-idx1-ubyte  \n",
      "./datasets/MNIST_Data/\n",
      "├── test\n",
      "│   ├── t10k-images-idx3-ubyte\n",
      "│   └── t10k-labels-idx1-ubyte\n",
      "└── train\n",
      "    ├── train-images-idx3-ubyte\n",
      "    └── train-labels-idx1-ubyte\n",
      "\n",
      "2 directories, 4 files\n"
     ]
    }
   ],
   "source": [
    "!wget -N https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/datasets/MNIST_Data.zip\n",
    "!unzip -o MNIST_Data.zip -d ./datasets/\n",
    "!tree ./datasets/MNIST_Data/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "定义数据集增强函数create_dataset，调用该函数对MNIST原始训练数据集60000张$28\\times28$的图片增强为1875个batch，每个batch张量为`(32,1,32,32)`的训练数据集。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import mindspore.dataset.vision.c_transforms as CV\n",
    "import mindspore.dataset.transforms.c_transforms as C\n",
    "from mindspore.dataset.vision import Inter\n",
    "from mindspore import dtype as mstype\n",
    "import mindspore.dataset as ds\n",
    "\n",
    "\n",
    "def create_dataset(data_path, batch_size=32, repeat_size=1,\n",
    "                   num_parallel_workers=1):\n",
    "    # define dataset\n",
    "    mnist_ds = ds.MnistDataset(data_path)\n",
    "\n",
    "    # define some parameters needed for data enhancement and rough justification\n",
    "    resize_height, resize_width = 32, 32\n",
    "    rescale = 1.0 / 255.0\n",
    "    shift = 0.0\n",
    "    rescale_nml = 1 / 0.3081\n",
    "    shift_nml = -1 * 0.1307 / 0.3081\n",
    "\n",
    "    # according to the parameters, generate the corresponding data enhancement method\n",
    "    c_trans = [\n",
    "        CV.Resize((resize_height, resize_width), interpolation=Inter.LINEAR),\n",
    "        CV.Rescale(rescale_nml, shift_nml),\n",
    "        CV.Rescale(rescale, shift),\n",
    "        CV.HWC2CHW()\n",
    "    ]\n",
    "    type_cast_op = C.TypeCast(mstype.int32)\n",
    "\n",
    "    # using map to apply operations to a dataset\n",
    "    mnist_ds = mnist_ds.map(operations=type_cast_op, input_columns=\"label\", num_parallel_workers=num_parallel_workers)\n",
    "    mnist_ds = mnist_ds.map(operations=c_trans, input_columns=\"image\", num_parallel_workers=num_parallel_workers)\n",
    "\n",
    "    # process the generated dataset\n",
    "    buffer_size = 10000\n",
    "    mnist_ds = mnist_ds.shuffle(buffer_size=buffer_size)\n",
    "    mnist_ds = mnist_ds.batch(batch_size, drop_remainder=True)\n",
    "    mnist_ds = mnist_ds.repeat(repeat_size)\n",
    "\n",
    "    return mnist_ds"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 定义深度神经网络\n",
    "\n",
    "本例采用LeNet5训练网络对数据集进行训练，其构造方式如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import mindspore.nn as nn\n",
    "from mindspore.common.initializer import Normal\n",
    "\n",
    "class LeNet5(nn.Cell):\n",
    "    \"\"\"Lenet network structure.\"\"\"\n",
    "    # define the operator required\n",
    "    def __init__(self, num_class=10, num_channel=1):\n",
    "        super(LeNet5, self).__init__()\n",
    "        self.conv1 = nn.Conv2d(num_channel, 6, 5, pad_mode='valid')\n",
    "        self.conv2 = nn.Conv2d(6, 16, 5, pad_mode='valid')\n",
    "        self.fc1 = nn.Dense(16 * 5 * 5, 120, weight_init=Normal(0.02))\n",
    "        self.fc2 = nn.Dense(120, 84, weight_init=Normal(0.02))\n",
    "        self.fc3 = nn.Dense(84, num_class, weight_init=Normal(0.02))\n",
    "        self.relu = nn.ReLU()\n",
    "        self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)\n",
    "        self.flatten = nn.Flatten()\n",
    "\n",
    "    # use the preceding operators to construct networks\n",
    "    def construct(self, x):\n",
    "        x = self.max_pool2d(self.relu(self.conv1(x)))\n",
    "        x = self.max_pool2d(self.relu(self.conv2(x)))\n",
    "        x = self.flatten(x)\n",
    "        x = self.relu(self.fc1(x))\n",
    "        x = self.relu(self.fc2(x))\n",
    "        x = self.fc3(x) \n",
    "        return x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 定义Model函数并在其中进行梯度累积定义\n",
    "\n",
    "梯度累积计算在Model函数中，这里对Model函数的原始代码进行重构。\n",
    "\n",
    "重构中需涉及重构的方法主要有五点：\n",
    "\n",
    "1. 定义梯度累积方法。\n",
    "2. 定义前向反向传播方法。\n",
    "3. 定义权重更新方法。\n",
    "4. 定义梯度累积清除方法。\n",
    "5. 定义模型训练执行器。\n",
    "\n",
    "具体实现如下："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 定义梯度累积方法\n",
    "\n",
    "需要定义梯度累积的计算方式，并将计算方式注册到计算图中，若不进行注册，计算方法将不能在`nn.Cell`中构建计算图。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import mindspore.ops as ops\n",
    "\n",
    "_sum_op = ops.MultitypeFuncGraph(\"grad_sum_op\")\n",
    "_clear_op = ops.MultitypeFuncGraph(\"clear_op\")\n",
    "\n",
    "\n",
    "@_sum_op.register(\"Tensor\", \"Tensor\")\n",
    "def _cumulative_grad(grad_sum, grad):\n",
    "    \"\"\"Apply grad sum to cumulative gradient.\"\"\"\n",
    "    add = ops.AssignAdd()\n",
    "    return add(grad_sum, grad)\n",
    "\n",
    "\n",
    "@_clear_op.register(\"Tensor\", \"Tensor\")\n",
    "def _clear_grad_sum(grad_sum, zero):\n",
    "    \"\"\"Apply zero to clear grad_sum.\"\"\"\n",
    "    success = True\n",
    "    success = ops.depend(success, ops.assign(grad_sum, zero))\n",
    "    return success"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`_cumulativa_gard`：梯度累积方法，将grad值加到`grad_sum`中，后续计算过程中作用是将`mini_batch`计算出的grad值添加到`grad_sum`中。  \n",
    "`_clear_grad_sum`：梯度清除方法，后续计算过程中的作用是当累积的梯度值`grad_sum`更新到权重中后，将`grad_sum`值清零。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 定义前向反向传播方法\n",
    "\n",
    "前向传播：利用训练前的模型函数，载入数据集中的数据，计算出loss值的过程。  \n",
    "反向传播：利用loss值和载入的数据，通过优化器函数计算出梯度值，并将梯度值更新到模型函数的权重中的过程。  \n",
    "这两个过程将在`TrainForwardBackward`中定义。  \n",
    "MindSpore采用继承`nn.Cell`的方法，并将整体的计算过程在`construct`中实现。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mindspore.nn import Cell\n",
    "\n",
    "class TrainForwardBackward(Cell):\n",
    "    def __init__(self, network, optimizer, grad_sum, sens=1.0):\n",
    "        super(TrainForwardBackward, self).__init__(auto_prefix=False)\n",
    "        self.network = network\n",
    "        self.network.set_grad()\n",
    "        self.network.add_flags(defer_inline=True)\n",
    "        self.weights = ParameterTuple(network.trainable_params())\n",
    "        self.optimizer = optimizer\n",
    "        self.grad_sum = grad_sum\n",
    "        self.grad = ops.GradOperation(get_by_list=True, sens_param=True)\n",
    "        self.sens = sens\n",
    "        self.hyper_map = ops.HyperMap()\n",
    "\n",
    "    def construct(self, *inputs):\n",
    "        weights = self.weights\n",
    "        loss = self.network(*inputs)\n",
    "        sens = ops.Fill()(ops.DType()(loss), ops.Shape()(loss), self.sens)\n",
    "        grads = self.grad(self.network, weights)(*inputs, sens)\n",
    "        return ops.depend(loss, self.hyper_map(ops.partial(_sum_op), self.grad_sum, grads))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`weights`：即网络中的权重参数。  \n",
    "`loss`：当前网络参数载入训练数据后的损失值。  \n",
    "`sens`：创建一个与loss相同类型和张量，将数值1填充其中。  \n",
    "`grads`：计算出本次`mini_batch`的梯度值。  \n",
    "`ops.depend`：使用前面的`loss`方法将loss值计算出来。\n",
    "\n",
    "此方法定义了模型训练过程中前向传播和方向传播的具体过程，并且可以保存出所有权重的参数，计算出当前模型的权重参数下的loss值。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 定义权重更新方法\n",
    "\n",
    "执行优化权重的方法，即将`grad_sum`更新到权重参数中。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "class TrainOptim(Cell):\n",
    "    def __init__(self, optimizer, grad_sum):\n",
    "        super(TrainOptim, self).__init__(auto_prefix=False)\n",
    "        self.optimizer = optimizer\n",
    "        self.grad_sum = grad_sum\n",
    "\n",
    "    def construct(self):\n",
    "        return self.optimizer(self.grad_sum)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 定义清除累积梯度的方法\n",
    "\n",
    "当累积的梯度`grad_sum`更新到权重中后，调用本函数将`grad_sum`值清零，再开始下一次梯度累积。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "class TrainClear(Cell):\n",
    "    def __init__(self, grad_sum, zeros):\n",
    "        super(TrainClear, self).__init__(auto_prefix=False)\n",
    "        self.grad_sum = grad_sum\n",
    "        self.zeros = zeros\n",
    "        self.hyper_map = ops.HyperMap()\n",
    "\n",
    "    def construct(self):\n",
    "        seccess = self.hyper_map(ops.partial(_clear_op), self.grad_sum, self.zeros)\n",
    "        return seccess"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 定义模型训练执行器\n",
    "\n",
    "在`GradientAccumulation`定义前向和反向以及梯度累积的执行过程。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import mindspore.nn as nn\n",
    "from mindspore import ParameterTuple, context, DatasetHelper\n",
    "from mindspore import save_checkpoint\n",
    "\n",
    "\n",
    "class GradientAccumulation:\n",
    "    def __init__(self, network, loss_fn, optimizer):\n",
    "        self._network = network\n",
    "        self._loss_fn = loss_fn\n",
    "        self._optimizer = optimizer\n",
    "\n",
    "        params = self._optimizer.parameters\n",
    "        self._grad_sum = params.clone(prefix=\"grad_sum\", init='zeros')\n",
    "        self._zeros = params.clone(prefix=\"zeros\", init='zeros')\n",
    "        self._train_forward_backward = self._build_train_forward_backward_network()\n",
    "        self._train_optim = self._build_train_optim()\n",
    "        self._train_clear = self._build_train_clear()\n",
    "\n",
    "    def _build_train_forward_backward_network(self):\n",
    "        \"\"\"Build forward and backward network\"\"\"\n",
    "        network = self._network\n",
    "        network = nn.WithLossCell(network, self._loss_fn)\n",
    "        loss_scale = 1.0\n",
    "        network = TrainForwardBackward(network, self._optimizer, self._grad_sum, loss_scale).set_train()\n",
    "        return network\n",
    "\n",
    "    def _build_train_optim(self):\n",
    "        \"\"\"Build optimizer network\"\"\"\n",
    "        network = TrainOptim(self._optimizer, self._grad_sum).set_train()\n",
    "        return network\n",
    "\n",
    "    def _build_train_clear(self):\n",
    "        \"\"\"Build clear network\"\"\"\n",
    "        network = TrainClear(self._grad_sum, self._zeros).set_train()\n",
    "        return network\n",
    "\n",
    "    def train_process(self, epoch, train_dataset, mini_steps=None):\n",
    "        \"\"\"\n",
    "        Training process. The data would be passed to network directly.\n",
    "        \"\"\"\n",
    "        dataset_helper = DatasetHelper(train_dataset, dataset_sink_mode=False, epoch_num=epoch)\n",
    "\n",
    "        for i in range(epoch):\n",
    "            step = 0\n",
    "            for k, next_element in enumerate(dataset_helper):\n",
    "                loss = self._train_forward_backward(*next_element)\n",
    "                if (k + 1) % mini_steps == 0:\n",
    "                    step += 1\n",
    "                    print(\"epoch:\", i + 1, \"step:\", step, \"loss is \", loss)\n",
    "                    self._train_optim()\n",
    "                    self._train_clear()\n",
    "\n",
    "            train_dataset.reset()\n",
    "\n",
    "        save_checkpoint(self._train_forward_backward, \"gradient_accumulation.ckpt\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`train_process`：构建训练执行过程，并将梯度累积的方法在其中实现，即每`mini_steps`个`batch`数据训练完成后更新一次权重参数。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 执行训练\n",
    "\n",
    "执行训练过程，类似快速入门案例，将损失函数`SoftmaxCrossEntropyWithLogits`，优化器函数`Momentum`和深度网络`LeNet5`传入，自定义模型训练函数`GradientAccumolation`，并调用`train_process`方法，使用数据进行训练。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "============== Starting Training ==============\n",
      "epoch: 1 step: 1 loss is  2.302572\n",
      "epoch: 1 step: 2 loss is  2.3027077\n",
      "epoch: 1 step: 3 loss is  2.3026032\n",
      "epoch: 1 step: 4 loss is  2.3029802\n",
      "epoch: 1 step: 5 loss is  2.3009882\n",
      "epoch: 1 step: 6 loss is  2.3028584\n",
      "epoch: 1 step: 7 loss is  2.2963173\n",
      "epoch: 1 step: 8 loss is  2.301377\n",
      "epoch: 1 step: 9 loss is  2.3019261\n",
      "... ...\n",
      "epoch: 1 step: 461 loss is  2.2829156\n",
      "epoch: 1 step: 462 loss is  2.2586172\n",
      "epoch: 1 step: 463 loss is  2.2446578\n",
      "epoch: 1 step: 464 loss is  2.1804438\n",
      "epoch: 1 step: 465 loss is  2.1868634\n",
      "epoch: 1 step: 466 loss is  2.118839\n",
      "epoch: 1 step: 467 loss is  2.1144428\n",
      "epoch: 1 step: 468 loss is  1.94902\n",
      "epoch: 2 step: 1 loss is  1.9981135\n",
      "epoch: 2 step: 2 loss is  2.0984964\n",
      "epoch: 2 step: 3 loss is  2.0167308\n",
      "epoch: 2 step: 4 loss is  2.0224195\n",
      "epoch: 2 step: 5 loss is  2.0156221\n",
      "epoch: 2 step: 6 loss is  1.9364308\n",
      "epoch: 2 step: 7 loss is  1.8101931\n",
      "... ...\n",
      "epoch: 2 step: 459 loss is  0.12907082\n",
      "epoch: 2 step: 460 loss is  0.15356739\n",
      "epoch: 2 step: 461 loss is  0.36636132\n",
      "epoch: 2 step: 462 loss is  0.2972299\n",
      "epoch: 2 step: 463 loss is  0.035830393\n",
      "epoch: 2 step: 464 loss is  0.3594339\n",
      "epoch: 2 step: 465 loss is  0.0087479465\n",
      "epoch: 2 step: 466 loss is  0.16021682\n",
      "epoch: 2 step: 467 loss is  0.11816633\n",
      "epoch: 2 step: 468 loss is  0.019440759\n",
      "epoch: 3 step: 1 loss is  0.0047739483\n",
      "epoch: 3 step: 2 loss is  0.03690074\n",
      "epoch: 3 step: 3 loss is  0.38832387\n",
      "epoch: 3 step: 4 loss is  0.121167235\n",
      "epoch: 3 step: 5 loss is  0.097194746\n",
      "epoch: 3 step: 6 loss is  0.047661886\n",
      "epoch: 3 step: 7 loss is  0.13189279\n",
      "... ...\n",
      "epoch: 3 step: 455 loss is  0.26175526\n",
      "epoch: 3 step: 456 loss is  0.028598795\n",
      "epoch: 3 step: 457 loss is  0.060193256\n",
      "epoch: 3 step: 458 loss is  0.04647294\n",
      "epoch: 3 step: 459 loss is  0.31234825\n",
      "epoch: 3 step: 460 loss is  0.07622443\n",
      "epoch: 3 step: 461 loss is  0.04356075\n",
      "epoch: 3 step: 462 loss is  0.02148334\n",
      "epoch: 3 step: 463 loss is  0.16675451\n",
      "epoch: 3 step: 464 loss is  0.017797818\n",
      "epoch: 3 step: 465 loss is  0.037047308\n",
      "epoch: 3 step: 466 loss is  0.009920539\n",
      "epoch: 3 step: 467 loss is  0.16409619\n",
      "epoch: 3 step: 468 loss is  0.058633693\n"
     ]
    }
   ],
   "source": [
    "if __name__ == \"__main__\":\n",
    "    context.set_context(mode=context.GRAPH_MODE, device_target=\"GPU\")\n",
    "    ds_train_path = \"./datasets/MNIST_Data/train/\"\n",
    "    ds_train = create_dataset(ds_train_path, 32)\n",
    "\n",
    "    net = LeNet5(10)\n",
    "    net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction=\"mean\")\n",
    "    net_opt = nn.Momentum(net.trainable_params(), 0.01, 0.9)\n",
    "    model = GradientAccumulation(net, net_loss, net_opt)\n",
    "\n",
    "    print(\"============== Starting Training ==============\")\n",
    "    model.train_process(3, ds_train, mini_steps=4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "本例中采用了累积梯度为`mini_steps=4`，即每训练4个batch的数据，进行一次权重参数的更新。最后在目录中保存了模型的权重参数文件`gradient_accumulate.ckpt`。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 验证累积梯度训练出的模型精度\n",
    "\n",
    "载入累积梯度训练结束后保存的模型参数`gradient_accumulation.ckpt`文件到神经网络LeNet5中，同时将其与损失函数（net_loss），优化器（net_opt）放入MindSpore的模型函数Model中，重新结合成完整计算图，输入验证数据集进行验证。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'Accuracy': 0.96875}\n"
     ]
    }
   ],
   "source": [
    "from mindspore.train.serialization import load_checkpoint, load_param_into_net\n",
    "from mindspore import Model\n",
    "from mindspore.nn import Accuracy\n",
    "\n",
    "\n",
    "ds_eval_path = \"./datasets/MNIST_Data/test/\"\n",
    "ds_eval_data = create_dataset(ds_eval_path,32)\n",
    "\n",
    "param_dict = load_checkpoint(\"gradient_accumulation.ckpt\")\n",
    "load_param_into_net(net, param_dict)\n",
    "model = Model(net, net_loss, net_opt, metrics={\"Accuracy\": Accuracy()})\n",
    "\n",
    "acc = model.eval(ds_eval_data, dataset_sink_mode=False)\n",
    "print(acc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "经过验证，使用累积梯度训练方法生成的模型精度大于0.95，此方法训练效果可行。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "MindSpore-1.0.1",
   "language": "python",
   "name": "mindspore-1.0.1"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
